Last updated on November 9, 2025
Thanks to Grok for reviewing.
A multimodal token with physics pixels can serve a wide range of general purposes. Separating its fields (tables), such as text, physics parameters, control, agent and other functions, is crucial: it lets the AI model learn the correlations within and between data fields of different functions, much like the functional separation of regions in the human brain.
Asymmetry XYZ is a method designed specifically for the multimodal token with physics pixels to separate perspective coordinates from rectangular coordinates. In the perspective coordinate system, X and Y are 16 bits each, which is enough for a pixel number such as 1 to 1920 or 1 to 1080, and Z is 32 bits in millimeters. In the rectangular coordinate system, X’, Y’ and Z’ are all 32 bits in millimeters. The two coordinate systems are thus distinguished by their different vector formats.
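The two vector formats can be sketched with Python's `struct` module. This is only an illustration: the little-endian byte order, the unsigned types for X, Y and Z, the signed types for X’, Y’ and Z’, and the names `PERSPECTIVE` and `RECTANGULAR` are assumptions, not part of the method as described.

```python
import struct

# Assumed encodings: little-endian, no padding.
# Perspective: X and Y are 16-bit pixel numbers, Z is a 32-bit distance in mm.
PERSPECTIVE = struct.Struct("<HHI")
# Rectangular: X', Y', Z' are 32-bit values in mm (signed, since a point
# may lie on either side of the origin).
RECTANGULAR = struct.Struct("<iii")

def pack_perspective(x_pixel, y_pixel, z_mm):
    """Pack a perspective-coordinate vector (16 + 16 + 32 bits)."""
    return PERSPECTIVE.pack(x_pixel, y_pixel, z_mm)

def pack_rectangular(x_mm, y_mm, z_mm):
    """Pack a rectangular-coordinate vector (32 + 32 + 32 bits)."""
    return RECTANGULAR.pack(x_mm, y_mm, z_mm)

persp = pack_perspective(1920, 1080, 2500)
rect = pack_rectangular(-1200, 340, 2500)

# The two formats differ in length: 64 bits (2D) versus 96 bits (3D).
print(len(persp) * 8, len(rect) * 8)
```

Because the perspective vector takes 2D and the rectangular vector takes 3D, the two kinds of coordinates cannot be confused by length alone, which is the point of the asymmetry.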
Other methods that should be applied in many cases to differentiate data fields of different functions include orthogonal data, prefix-suffix and vacant space.
All of the methods above appear in the example below, which is intended for Terminal AI such as a vehicle, robot or drone. For Server AI, the token may have much larger fields, such as 10x1028D for text.
A multimodal token with physics pixels includes the following data, all of which are orthogonal in different dimensions (here 1D = 32 bits).
——/global parameters for token/——
1D: token ID,
1D: timestamp,
1D: data source, (input/external raw or output/internal generated, synthetic or not, physics-compliant or not, etc.)
2D: mask tensor, (for text, RGBA, RGB, etc)
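As a rough illustration of the global parameters, the sketch below packs them into 5D (20 bytes). The bit assignments in `DataSource`, the byte order, the integer encoding of the timestamp and mask, and all names are hypothetical; the layout above only fixes the field widths.

```python
import struct
from enum import IntFlag

class DataSource(IntFlag):
    # Hypothetical bit assignments for the 1D "data source" field.
    INTERNAL_GENERATED = 1   # unset means input/external raw
    SYNTHETIC = 2
    PHYSICS_COMPLIANT = 4

# token ID (1D), timestamp (1D), data source (1D), mask (2D = 64 bits)
GLOBAL_PARAMS = struct.Struct("<IIIQ")

def pack_global(token_id, timestamp, source, mask):
    """Pack the global parameters of one token into 20 bytes."""
    return GLOBAL_PARAMS.pack(token_id, timestamp, int(source), mask)

flags = DataSource.SYNTHETIC | DataSource.PHYSICS_COMPLIANT
blob = pack_global(1, 1762646400, flags, 0b111)
print(len(blob) // 4, "D")
```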
——/768D for text/——
TABLE HEADER {
8bits: table ID, (“1”)
8bits: number of rows, (“1”)
8bits: number of columns, (“1”)
}
768D: text.
32D: vacant space. (to separate text field from other fields)
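The text section, with its table header and trailing vacant space, might be assembled as follows. This is a sketch under assumptions: the text field is treated as 768 float32 values, the vacant space is zero-filled, and `pack_text_section` is a hypothetical helper name.

```python
import struct

D = 4                      # bytes per dimension (1D = 32 bits)
TEXT_DIMS = 768
VACANT_DIMS = 32

def pack_text_section(embedding):
    """Build table header + 768D text + 32D vacant space."""
    if len(embedding) != TEXT_DIMS:
        raise ValueError(f"expected {TEXT_DIMS} values, got {len(embedding)}")
    header = bytes([1, 1, 1])                  # table ID, rows, columns (8 bits each)
    body = struct.pack(f"<{TEXT_DIMS}f", *embedding)
    vacant = bytes(VACANT_DIMS * D)            # assumed zero fill as the separator
    return header + body + vacant

section = pack_text_section([0.0] * TEXT_DIMS)
print(len(section))   # 3 header bytes + 3072 text bytes + 128 vacant bytes
```

The vacant block carries no information of its own; it only keeps the text field apart from whatever table follows.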
——/list for sensors in this token/——
TABLE HEADER {
8bits: number of rows, (“3”, 3 types include camera, mm wave radar and microphone)
8bits: number of columns, (“6.75”D)
}
for (each of “A” sensors in this token) {
8bits: T=sensor type, (“1” means camera)
8bits: N=sensor ID, (“1” for first camera)
8bits: C=coordinate system type, (“C0” the unified rectangular coordinate system)
3D: position of sensor, (position in above “C0” coordinate system)
2D: direction in spherical coordinates, (horizontal+vertical angles in above “C0” coordinate system)
}
8D: Vacant space.
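One entry of the sensor list could be packed as below. The numeric code for “C0”, the type code for the microphone, the use of integer millimeters for position and float32 degrees for direction, and the name `pack_sensor` are all assumptions made for illustration.

```python
import struct

# T, N, C (8 bits each) + position (3D) + direction (2D)
SENSOR_ENTRY = struct.Struct("<BBB3i2f")

CAMERA = 1          # "1" means camera in the layout above
MM_WAVE_RADAR = 2   # "2" is used for the radar table
MICROPHONE = 3      # hypothetical code, not given in the layout
C0 = 0              # hypothetical code for the unified rectangular system

def pack_sensor(sensor_type, sensor_id, position_mm, direction_deg):
    """Pack one sensor entry: TNC, position in C0, horizontal/vertical angles."""
    return SENSOR_ENTRY.pack(sensor_type, sensor_id, C0,
                             *position_mm, *direction_deg)

entry = pack_sensor(CAMERA, 1, (0, 1500, 1200), (0.0, -10.0))
print(SENSOR_ENTRY.size, SENSOR_ENTRY.size / 4)
```

Under these assumptions the listed fields occupy 23 bytes, or 5.75D. The table header gives 6.75D per entry, so the intended layout may include a field or padding that is not listed.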
——/table for physics pixel/——
TABLE HEADER {
8bits: table ID, (“3”)
8bits: rows, (?)
8bits: columns, (?)
}
for (each of 64 physics pixels) {
??D: TNC+XYZ, (C=“01” means the sensor’s own perspective coordinate system; XYZ is the position of the physics pixel in the perspective coordinate system of the camera; X=16bits is the physics pixel’s horizontal serial number (from 1) in the 2D frame of the camera; Y=16bits is the physics pixel’s vertical serial number (from 1) in the 2D frame of the camera; Z=32bits is the distance from the physics pixel to the camera lens; “Z=0” means a raw pixel on the camera lens)
??D: RGBA, (for “X,Y,0” RGBA=RGB, and for any invisible physics pixel RGBA=“-1,-1,-1,-1”)
??D: C+X’Y’Z’, (the mapped 3D point of the physics pixel, C=“C0” unified rectangular coordinate system, X’=32bits in mm, Y’=32bits in mm, Z’=32bits in mm)
??D: direction in spherical coordinates, (horizontal angle+vertical angle in the “C0” system; this is the direction perpendicular to the interface between the near object and the far object, on which the physics pixel lies)
??D: pressure,
??D: near object ID,
??D: selected parameters of the point of near object on the physics pixel, (this field is optional, which may include temperature, velocity, rotation vector, hardness, density, material or no parameter at all)
??D: far object ID,
??D: selected parameters of the point of far object on the physics pixel.
}
8D: Vacant space.
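The rules stated for the first two physics pixel fields (serial numbers start at 1, “Z=0” marks a raw pixel, and “-1,-1,-1,-1” marks an invisible pixel) can be checked with a small record type. The class name `PhysicsPixelHead` and its attribute names are hypothetical, and the remaining fields are left out because their widths are not given.

```python
from dataclasses import dataclass

INVISIBLE_RGBA = (-1, -1, -1, -1)   # sentinel for an invisible physics pixel

@dataclass(frozen=True)
class PhysicsPixelHead:
    sensor_type: int    # T, e.g. 1 = camera
    sensor_id: int      # N
    coord_type: int     # C, 1 = the sensor's own perspective system ("01")
    x: int              # horizontal serial number in the frame, from 1
    y: int              # vertical serial number in the frame, from 1
    z_mm: int           # distance to the camera lens in mm
    rgba: tuple = INVISIBLE_RGBA

    def __post_init__(self):
        if self.x < 1 or self.y < 1:
            raise ValueError("X and Y serial numbers start at 1")
        if self.z_mm < 0:
            raise ValueError("Z is a distance and cannot be negative")

    @property
    def is_raw(self):
        """True when Z=0, i.e. a raw pixel on the camera lens."""
        return self.z_mm == 0

    @property
    def is_visible(self):
        return self.rgba != INVISIBLE_RGBA

raw = PhysicsPixelHead(1, 1, 1, 960, 540, 0, (12, 34, 56, 255))
hidden = PhysicsPixelHead(1, 1, 1, 960, 540, 3000)
print(raw.is_raw, raw.is_visible, hidden.is_visible)
```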
——/table for objects in this token/——
TABLE HEADER {
8bits: table ID, (“4”)
8bits: rows, (?)
8bits: columns, (?)
}
for(i=1 to 4){
object_i ID,
object class,
parent object,
object mass,
object velocity,
object rotation vector,
object eatable or not,
}
8D: Vacant space.
——/table for mm wave radar/——
TABLE HEADER {
8bits: table ID, (“5”)
8bits: rows, (?)
8bits: columns, (?)
}
/each table includes 10 3D points out of all the points generated by this mm wave radar at one time/
for (i=1 to 10) {
TNC+x’y’z’, (T=“2” mm wave radar, N=“1” first radar of the type, C=“C0” unified rectangular coordinate system, x’y’z’ are all 32bits)
velocity vector.
}
8D: Vacant space.
——/table for microphone/——
TABLE HEADER {
8bits: table ID, (“07”)
8bits: rows, (?)
8bits: columns, (?)
}
{
TNC+Sound signal of 1/15 sec.
}
16D: Vacant space.
——/table for control/——
TABLE HEADER{}
{ internal sensors,
actuators.
}
16D: Vacant space.
——/table for agent/——
TABLE HEADER {
……
}
1024D: Vacant space.